Frequent Sets Mining in Data Stream Environments
Abstract
In recent years, data streams have emerged as a new data type that has attracted much attention from the data mining community. They arise naturally in a number of applications (Brian et al., 2002), including financial services (stock tickers, financial monitoring), sensor networks (earth sensing satellites, astronomic observations), and web tracking and personalization (web click streams). These stream applications share three distinguishing characteristics that limit the applicability of most traditional mining algorithms (Minos et al., 2002; Pedro and Geoff, 2001): (1) the arrival rate of the stream is continuous, high and unpredictable; (2) the volume of data is unbounded, making it impractical to store the entire content of the stream; (3) in terms of practical applicability, stream mining results are often expected to closely approximate the exact results and to be available at any time. Consequently, the main challenge in mining data streams is to develop effective algorithms that process stream data in a one-pass manner (preferably on-line) while operating under system resource limitations (e.g., memory space, CPU cycles or bandwidth). This chapter discusses the above challenge in the context of finding frequent sets from transactional data streams. The problems are presented, and some effective methods, from both deterministic and probabilistic approaches, are reviewed in detail. The tradeoffs between memory space and accuracy of mining results are also discussed. Furthermore, the problems are considered in three fundamental mining models for stream environments: the landmark window, forgetful window and sliding window models.
Similar resources
Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows
Today, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that frequently appear in the data set and mark them as frequent patterns to enable users to make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...
Mining Data Stream for Load Shedding
A data stream is a continuous flow of data, which necessitates load shedding in data stream processing systems. Here we study overload handling for frequent pattern mining in data streams. In this paper, load shedding uses a frequent pattern matching algorithm, i.e., priority, transaction and attribute, in overload situations. The heavy workload or continuous stream of the mining algorithm lies mostly in...
An Efficient One-pass Method for Discovering Bases of Recently Frequent Episodes over Online Data Streams
The knowledge embedded in an online data stream is likely to change over time due to the dynamic evolution of the stream. Consequently, in frequent episode mining over an online stream, frequent episodes should be adaptively extracted from recently generated stream segments instead of the whole stream. However, almost all existing frequent episode mining approaches find episodes frequently occu...
Efficient Graph Structure for the Mining of Frequent Itemsets from Data Streams
In this paper, we propose a graph structure which captures important data streams. This graph can be easily maintained and mined for frequent item sets as well as various other patterns like constrained item sets. This graph captures the contents of transaction in a window and arranges nodes according to some canonical order that is unaffected by changes in item frequency. This graph structure ...
Frequent Itemset Mining Using Rough-Sets
Frequent pattern mining is the process of finding a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set. It was proposed in the context of frequent itemsets and association rule mining. Frequent pattern mining is used to find inherent regularities in data. What products were often purchased together? Its applications include basket data analysis, cro...